Parallel algorithms for scheduling data-graph computations
نویسنده
چکیده
A data-graph computation — popularized by such programming systems as Pregel, GraphLab, Galois, Ligra, PowerGraph, and GraphChi — is an algorithm that iteratively performs local updates on the vertices of a graph. During each round of a data-graph computation, a user-supplied update function atomically modifies the data associated with a vertex as a function of the vertex’s prior data and that of adjacent vertices. A dynamic data-graph computation updates only an active subset of the vertices during a round, and those updates determine the set of active vertices for the next round. In this thesis, I explore two ways of scheduling deterministic parallel data-graph computations that provide performance guarantees culminating in theoretical contributions to graph theory and practical, high-performance systems. In particular, I describe a system called Prism which processes dynamic and static data-graph computations on arbitrary graphs using a technique called chromatic scheduling. Using a vertex-coloring to identify independent sets of vertices, which may be safely processed in parallel, Prism serializes through the colors and processes the independent sets in parallel, thus executing data-graph computations deterministically and without the use of costly atomic instructions (e.g., Compare-And-Swap). Prism supports dynamic data-graph computations deterministically and work-efficiently through the introduction of multibag and multivector data structures. Prism requires a vertex-coloring, and since graphs are generally not supplied with one, it is necessary to find one as a preprocessing step. Furthermore, the runtime of Prism is linear in the number of colors and thus motivates a study in this thesis of fast parallel coloring algorithms that provide vertex-colorings with few colors in practice. At the core of the analysis of these coloring algorithms lies a new result about the maximum depth of a random priority dag, the dag that results from randomly ordering vertices and directing edges from lower to higher numbered vertices in the order. In particular, when the largest degree ∆ in the graph G = (V,E) is less than ln |V |, I show a tight bound on the longest path: Θ (lnV/ ln (e lnV/∆)) with high probability. When ∆ is greater than ln |V |, the longest path in the dag is simply Θ ( min {∆, √ E} ) , also with high probability. I also present a system called Laika which processes data-graph computations for the special, but important, case of graphs representing physical simulations. Such graphs typically have vertices with coordinates in 3D space and are connected to other “nearby” vertices. We take advantage of these two properties to execute physical simulations, cast as data-graph computations, that make efficient use of cache resources. I analyze a contrived graph construction — a random cube graph — as a proxy for the mesh graphs that arise in physical simulations: n vertices are uniformly randomly assigned positions in the unit cube and have
منابع مشابه
Chromatic Scheduling of Dynamic Data-Graph Computations
Data-graph computations are a parallel-programming model popularized by programming systems such as Pregel, GraphLab, PowerGraph, and GraphChi. A fundamental issue in parallelizing data-graph computations is the avoidance of races between computation occuring on overlapping regions of the graph. Common solutions such as locking protocols and bulk-synchronous execution often sacrifice performanc...
متن کاملRuntime Data Flow Scheduling of Matrix Computations
We investigate the scheduling of matrix computations expressed as directed acyclic graphs for shared-memory parallelism. Because of the data granularity in this problem domain, even slight variations in load balance or data locality can greatly affect performance. Well-known scheduling algorithms such as work stealing have proven time and space bounds, but these bounds do not provide a discerna...
متن کاملParallel Jobs Scheduling with a Specific Due Date: Asemi-definite Relaxation-based Algorithm
This paper considers a different version of the parallel machines scheduling problem in which the parallel jobs simultaneously requirea pre-specifiedjob-dependent number of machines when being processed.This relaxation departs from one of the classic scheduling assumptions. While the analytical conditions can be easily statedfor some simple models, a graph model approach is required when confli...
متن کاملCombinatorial Parallel and Scientific Computing ∗
Combinatorial algorithms have long played a pivotal enabling role in many applications of parallel computing. Graph algorithms in particular arise in load balancing, scheduling, mapping and many other aspects of the parallelization of irregular applications. These are still active research areas, mostly due to evolving computational techniques and rapidly changing computational platforms. But t...
متن کاملPartitioning Dag Computations: a Cautionary Note
The representation of a parallel computation as a directed acyclic graph can help the programmer to analyse the properties of the computation in order to optimise partitioning and scheduling. We point out that the results obtained from such \optimizations" are only as good as the underlying cost model.
متن کامل